Overview

Dataset statistics

                               Dataset A   Dataset B
Number of variables            12          12
Number of observations         446         446
Missing cells                  432         446
Missing cells (%)              8.1%        8.3%
Duplicate rows                 0           0
Duplicate rows (%)             0.0%        0.0%
Total size in memory           45.3 KiB    45.3 KiB
Average record size in memory  104.0 B     104.0 B
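
The derived rows in the table above follow from the raw counts. As a minimal sketch (the function names are illustrative and not part of ydata-profiling, and the conversion assumes 1 KiB = 1024 B), the missing-cell percentage is the missing count divided by rows times columns, and the average record size is the total memory divided by the number of observations:

```python
def missing_cell_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Share of missing cells among all cells, in percent."""
    return 100.0 * missing_cells / (n_rows * n_cols)


def avg_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Average bytes per row, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / n_rows


# Counts taken from the Dataset statistics table.
pct_a = missing_cell_pct(432, 446, 12)
pct_b = missing_cell_pct(446, 446, 12)
record_size = avg_record_size_bytes(45.3, 446)

print(f"Dataset A: {pct_a:.1f}% missing cells")
print(f"Dataset B: {pct_b:.1f}% missing cells")
print(f"Average record size: {record_size:.1f} B")
```

Rounded to one decimal place, these values agree with the percentages and record size reported in the table.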

Variable types

              Dataset A   Dataset B
Numeric       5           5
Categorical   4           4
Text          3           3

Alerts

Dataset A                                        Dataset B                                        Alert
Survived is highly overall correlated with Sex   Survived is highly overall correlated with Sex   High Correlation
Sex is highly overall correlated with Survived   Sex is highly overall correlated with Survived   High Correlation
Age has 90 (20.2%) missing values                Age has 96 (21.5%) missing values                Missing
Cabin has 341 (76.5%) missing values             Cabin has 350 (78.5%) missing values             Missing
PassengerId has unique values                    PassengerId has unique values                    Unique
Name has unique values                           Name has unique values                           Unique
SibSp has 308 (69.1%) zeros                      SibSp has 307 (68.8%) zeros                      Zeros
Parch has 347 (77.8%) zeros                      Parch has 342 (76.7%) zeros                      Zeros
Fare has 6 (1.3%) zeros                          Fare has 10 (2.2%) zeros                         Zeros

Reproduction

                         Dataset A                    Dataset B
Analysis started         2023-09-12 08:02:43.806318   2023-09-12 08:02:48.960807
Analysis finished        2023-09-12 08:02:48.958929   2023-09-12 08:02:53.110154
Duration                 5.15 seconds                 4.15 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json
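
The Duration row is the difference between the finish and start timestamps. A small sketch using only the standard library (the helper name duration_seconds is hypothetical) reproduces the two durations from the timestamps listed above:

```python
from datetime import datetime


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed time between two ISO-style timestamps, in seconds."""
    start = datetime.fromisoformat(started)
    end = datetime.fromisoformat(finished)
    return (end - start).total_seconds()


# Timestamps copied from the Reproduction table.
duration_a = duration_seconds("2023-09-12 08:02:43.806318",
                              "2023-09-12 08:02:48.958929")
duration_b = duration_seconds("2023-09-12 08:02:48.960807",
                              "2023-09-12 08:02:53.110154")

print(f"Dataset A: {duration_a:.2f} seconds")
print(f"Dataset B: {duration_b:.2f} seconds")
```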

Variables

PassengerId
Real number (ℝ)

               Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           431.71076   451.92601
Minimum        2           1
Maximum        887         891
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     2           1
5-th percentile             43.5        51.5
Q1                          196.25      231.5
median                      428         456.5
Q3                          652.5       684.75
95-th percentile            828         853.75
Maximum                     887         891
Range                       885         890
Interquartile range (IQR)   456.25      453.25

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                255.28192       256.96998
Coefficient of variation (CV)     0.59132628      0.56861073
Kurtosis                          -1.1982817      -1.1944421
Mean                              431.71076       451.92601
Median Absolute Deviation (MAD)   229             227
Skewness                          0.019860349     0.0072009949
Sum                               192543          201559
Variance                          65168.858       66033.57
Monotonicity                      Not monotonic   Not monotonic
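
Several of the statistics above are functions of the others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below checks these relationships for PassengerId using the rounded values printed in the report; the class name NumericSummary is illustrative only and is not a ydata-profiling type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericSummary:
    """Reported summary values for one numeric variable."""
    minimum: float
    maximum: float
    q1: float
    q3: float
    mean: float
    std: float

    @property
    def value_range(self) -> float:
        return self.maximum - self.minimum

    @property
    def iqr(self) -> float:
        return self.q3 - self.q1

    @property
    def cv(self) -> float:
        # Coefficient of variation: spread relative to the mean.
        return self.std / self.mean

    @property
    def variance(self) -> float:
        return self.std ** 2


# Values copied from the PassengerId tables.
passenger_id_a = NumericSummary(2, 887, 196.25, 652.5, 431.71076, 255.28192)
passenger_id_b = NumericSummary(1, 891, 231.5, 684.75, 451.92601, 256.96998)

for label, s in (("A", passenger_id_a), ("B", passenger_id_b)):
    print(label, s.value_range, s.iqr, round(s.cv, 6), round(s.variance, 2))
```

Because the inputs are rounded report values, the recomputed CV and variance match the reported figures only up to rounding.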
Histogram with fixed size bins (bins=50)
Dataset A
Value                Count   Frequency (%)
147                  1       0.2%
713                  1       0.2%
777                  1       0.2%
494                  1       0.2%
849                  1       0.2%
490                  1       0.2%
410                  1       0.2%
99                   1       0.2%
192                  1       0.2%
243                  1       0.2%
Other values (436)   436     97.8%

Dataset B
Value                Count   Frequency (%)
694                  1       0.2%
355                  1       0.2%
464                  1       0.2%
11                   1       0.2%
129                  1       0.2%
555                  1       0.2%
199                  1       0.2%
184                  1       0.2%
638                  1       0.2%
8                    1       0.2%
Other values (436)   436     97.8%
ValueCountFrequency (%)
2 1
0.2%
3 1
0.2%
4 1
0.2%
6 1
0.2%
7 1
0.2%
8 1
0.2%
9 1
0.2%
10 1
0.2%
11 1
0.2%
15 1
0.2%
ValueCountFrequency (%)
1 1
0.2%
6 1
0.2%
8 1
0.2%
9 1
0.2%
10 1
0.2%
11 1
0.2%
13 1
0.2%
14 1
0.2%
18 1
0.2%
20 1
0.2%
ValueCountFrequency (%)
1 1
0.2%
6 1
0.2%
8 1
0.2%
9 1
0.2%
10 1
0.2%
11 1
0.2%
13 1
0.2%
14 1
0.2%
18 1
0.2%
20 1
0.2%
ValueCountFrequency (%)
2 1
0.2%
3 1
0.2%
4 1
0.2%
6 1
0.2%
7 1
0.2%
8 1
0.2%
9 1
0.2%
10 1
0.2%
11 1
0.2%
15 1
0.2%

Survived
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Category counts
Value   Dataset A   Dataset B
0       277         286
1       169         160

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   2           2
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   1           0
2nd row   0           0
3rd row   0           0
4th row   0           0
5th row   1           0

Common Values

Dataset A
Value   Count   Frequency (%)
0       277     62.1%
1       169     37.9%

Dataset B
Value   Count   Frequency (%)
0       286     64.1%
1       160     35.9%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common-values plots for Dataset A and Dataset B]
ValueCountFrequency (%)
0 277
62.1%
1 169
37.9%
ValueCountFrequency (%)
0 286
64.1%
1 160
35.9%

Most occurring characters

ValueCountFrequency (%)
0 277
62.1%
1 169
37.9%
ValueCountFrequency (%)
0 286
64.1%
1 160
35.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 446
100.0%
ValueCountFrequency (%)
Decimal Number 446
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 277
62.1%
1 169
37.9%
ValueCountFrequency (%)
0 286
64.1%
1 160
35.9%

Most occurring scripts

ValueCountFrequency (%)
Common 446
100.0%
ValueCountFrequency (%)
Common 446
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 277
62.1%
1 169
37.9%
ValueCountFrequency (%)
0 286
64.1%
1 160
35.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 446
100.0%
ValueCountFrequency (%)
ASCII 446
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 277
62.1%
1 169
37.9%
ValueCountFrequency (%)
0 286
64.1%
1 160
35.9%

Pclass
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Category counts
Value   Dataset A   Dataset B
3       252         253
1       105         103
2       89          90

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   3           3
2nd row   3           3
3rd row   3           3
4th row   2           3
5th row   2           3

Common Values

ValueCountFrequency (%)
3 252
56.5%
1 105
23.5%
2 89
 
20.0%
ValueCountFrequency (%)
3 253
56.7%
1 103
23.1%
2 90
 
20.2%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common-values plots for Dataset A and Dataset B]
ValueCountFrequency (%)
3 252
56.5%
1 105
23.5%
2 89
 
20.0%
ValueCountFrequency (%)
3 253
56.7%
1 103
23.1%
2 90
 
20.2%

Most occurring characters

ValueCountFrequency (%)
3 252
56.5%
1 105
23.5%
2 89
 
20.0%
ValueCountFrequency (%)
3 253
56.7%
1 103
23.1%
2 90
 
20.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 446
100.0%
ValueCountFrequency (%)
Decimal Number 446
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 252
56.5%
1 105
23.5%
2 89
 
20.0%
ValueCountFrequency (%)
3 253
56.7%
1 103
23.1%
2 90
 
20.2%

Most occurring scripts

ValueCountFrequency (%)
Common 446
100.0%
ValueCountFrequency (%)
Common 446
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 252
56.5%
1 105
23.5%
2 89
 
20.0%
ValueCountFrequency (%)
3 253
56.7%
1 103
23.1%
2 90
 
20.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 446
100.0%
ValueCountFrequency (%)
ASCII 446
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 252
56.5%
1 105
23.5%
2 89
 
20.0%
ValueCountFrequency (%)
3 253
56.7%
1 103
23.1%
2 90
 
20.2%

Name
Text

               Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      82          57
Median length   48          48
Mean length     27.013453   26.697309
Min length      15          12

Characters and Unicode

                      Dataset A   Dataset B
Total characters      12048       11907
Distinct characters   60          59
Distinct categories   7           7
Distinct scripts      2           2
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       446         446
Unique (%)   100.0%      100.0%

Sample

          Dataset A                                      Dataset B
1st row   Andersson, Mr. August Edvard ("Wennerstrom")   Saad, Mr. Khalil
2nd row   Coelho, Mr. Domingos Fernandeo                 Boulos, Miss. Nourelain
3rd row   Gronnestad, Mr. Daniel Danielsen               Gallagher, Mr. Martin
4th row   Jenkin, Mr. Stephen Curnow                     Sivic, Mr. Husein
5th row   Caldwell, Master. Alden Gates                  Hampe, Mr. Leon

Dataset A
Value                Count   Frequency (%)
mr                   256     14.1%
miss                 84      4.6%
mrs                  74      4.1%
william              29      1.6%
john                 24      1.3%
henry                21      1.2%
master               20      1.1%
james                15      0.8%
charles              13      0.7%
thomas               12      0.7%
Other values (877)   1267    69.8%

Dataset B
Value                Count   Frequency (%)
mr                   271     15.1%
miss                 91      5.1%
mrs                  58      3.2%
william              28      1.6%
master               20      1.1%
henry                19      1.1%
james                15      0.8%
john                 14      0.8%
george               12      0.7%
joseph               10      0.6%
Other values (904)   1256    70.0%

Most occurring characters

Dataset A
Value               Count   Frequency (%)
(space)             1370    11.4%
r                   995     8.3%
e                   865     7.2%
a                   845     7.0%
s                   653     5.4%
n                   621     5.2%
i                   620     5.1%
M                   568     4.7%
l                   549     4.6%
o                   504     4.2%
Other values (50)   4458    37.0%

Dataset B
Value               Count   Frequency (%)
(space)             1349    11.3%
r                   976     8.2%
a                   845     7.1%
e                   832     7.0%
i                   659     5.5%
s                   649     5.5%
n                   625     5.2%
M                   572     4.8%
l                   517     4.3%
o                   484     4.1%
Other values (49)   4399    36.9%

Most occurring categories

Dataset A
Value               Count   Frequency (%)
Lowercase Letter    7739    64.2%
Uppercase Letter    1829    15.2%
Space Separator     1370    11.4%
Other Punctuation   942     7.8%
Close Punctuation   80      0.7%
Open Punctuation    80      0.7%
Dash Punctuation    8       0.1%

Dataset B
Value               Count   Frequency (%)
Lowercase Letter    7653    64.3%
Uppercase Letter    1807    15.2%
Space Separator     1349    11.3%
Other Punctuation   959     8.1%
Close Punctuation   66      0.6%
Open Punctuation    66      0.6%
Dash Punctuation    7       0.1%
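
The category names in these tables are Unicode general categories. As a rough sketch of how such counts can be obtained (this is not the report's own implementation; category_counts and CATEGORY_NAMES are names chosen here for illustration), Python's standard unicodedata module returns the two-letter category code of each character, which can then be mapped to the long names used above:

```python
import unicodedata
from collections import Counter

# Two-letter general category codes mapped to the long names shown in the report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# One sample value from the Name column of Dataset B.
counts = category_counts(["Saad, Mr. Khalil"])
for name, count in counts.most_common():
    print(name, count)
```

Applied to a whole column instead of a single name, the same loop would yield totals of the kind listed in the tables.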

Most frequent character per category

Space Separator
ValueCountFrequency (%)
1370
100.0%
ValueCountFrequency (%)
1349
100.0%
Lowercase Letter
ValueCountFrequency (%)
r 995
12.9%
e 865
11.2%
a 845
10.9%
s 653
8.4%
n 621
8.0%
i 620
8.0%
l 549
 
7.1%
o 504
 
6.5%
t 337
 
4.4%
h 263
 
3.4%
Other values (16) 1487
19.2%
ValueCountFrequency (%)
r 976
12.8%
a 845
11.0%
e 832
10.9%
i 659
8.6%
s 649
8.5%
n 625
8.2%
l 517
 
6.8%
o 484
 
6.3%
t 335
 
4.4%
h 255
 
3.3%
Other values (16) 1476
19.3%
Uppercase Letter
ValueCountFrequency (%)
M 568
31.1%
J 120
 
6.6%
A 115
 
6.3%
H 95
 
5.2%
S 90
 
4.9%
C 89
 
4.9%
E 84
 
4.6%
B 80
 
4.4%
W 72
 
3.9%
R 57
 
3.1%
Other values (15) 459
25.1%
ValueCountFrequency (%)
M 572
31.7%
A 127
 
7.0%
H 108
 
6.0%
J 103
 
5.7%
S 88
 
4.9%
C 78
 
4.3%
E 72
 
4.0%
W 69
 
3.8%
B 69
 
3.8%
L 65
 
3.6%
Other values (15) 456
25.2%
Other Punctuation
ValueCountFrequency (%)
. 447
47.5%
, 446
47.3%
" 44
 
4.7%
' 4
 
0.4%
/ 1
 
0.1%
ValueCountFrequency (%)
, 446
46.5%
. 446
46.5%
" 62
 
6.5%
' 5
 
0.5%
Close Punctuation
ValueCountFrequency (%)
) 80
100.0%
ValueCountFrequency (%)
) 66
100.0%
Open Punctuation
ValueCountFrequency (%)
( 80
100.0%
ValueCountFrequency (%)
( 66
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 8
100.0%
ValueCountFrequency (%)
- 7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 9568
79.4%
Common 2480
 
20.6%
ValueCountFrequency (%)
Latin 9460
79.4%
Common 2447
 
20.6%

Most frequent character per script

Common
ValueCountFrequency (%)
1370
55.2%
. 447
 
18.0%
, 446
 
18.0%
) 80
 
3.2%
( 80
 
3.2%
" 44
 
1.8%
- 8
 
0.3%
' 4
 
0.2%
/ 1
 
< 0.1%
ValueCountFrequency (%)
1349
55.1%
, 446
 
18.2%
. 446
 
18.2%
) 66
 
2.7%
( 66
 
2.7%
" 62
 
2.5%
- 7
 
0.3%
' 5
 
0.2%
Latin
ValueCountFrequency (%)
r 995
 
10.4%
e 865
 
9.0%
a 845
 
8.8%
s 653
 
6.8%
n 621
 
6.5%
i 620
 
6.5%
M 568
 
5.9%
l 549
 
5.7%
o 504
 
5.3%
t 337
 
3.5%
Other values (41) 3011
31.5%
ValueCountFrequency (%)
r 976
 
10.3%
a 845
 
8.9%
e 832
 
8.8%
i 659
 
7.0%
s 649
 
6.9%
n 625
 
6.6%
M 572
 
6.0%
l 517
 
5.5%
o 484
 
5.1%
t 335
 
3.5%
Other values (41) 2966
31.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12048
100.0%
ValueCountFrequency (%)
ASCII 11907
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1370
 
11.4%
r 995
 
8.3%
e 865
 
7.2%
a 845
 
7.0%
s 653
 
5.4%
n 621
 
5.2%
i 620
 
5.1%
M 568
 
4.7%
l 549
 
4.6%
o 504
 
4.2%
Other values (50) 4458
37.0%
ValueCountFrequency (%)
1349
 
11.3%
r 976
 
8.2%
a 845
 
7.1%
e 832
 
7.0%
i 659
 
5.5%
s 649
 
5.5%
n 625
 
5.2%
M 572
 
4.8%
l 517
 
4.3%
o 484
 
4.1%
Other values (49) 4399
36.9%

Sex
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Category counts
Value    Dataset A   Dataset B
male     285         297
female   161         149

Length

                Dataset A   Dataset B
Max length      6           6
Median length   4           4
Mean length     4.7219731   4.6681614
Min length      4           4

Characters and Unicode

                      Dataset A   Dataset B
Total characters      2106        2082
Distinct characters   5           5
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
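
For a categorical variable with only a few values, the total character count and mean length follow directly from the category counts. A minimal sketch, assuming the counts for Sex given above (the helper length_stats is hypothetical):

```python
def length_stats(counts):
    """Return (total characters, mean length) for a value -> count mapping."""
    total_chars = sum(len(value) * n for value, n in counts.items())
    n_values = sum(counts.values())
    return total_chars, total_chars / n_values


# Category counts for Sex in each dataset.
total_a, mean_a = length_stats({"male": 285, "female": 161})
total_b, mean_b = length_stats({"male": 297, "female": 149})

print(f"Dataset A: {total_a} characters, mean length {mean_a:.7f}")
print(f"Dataset B: {total_b} characters, mean length {mean_b:.7f}")
```

The totals of 2106 and 2082 characters and the mean lengths agree with the Length and Characters tables.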

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   male        male
2nd row   male        female
3rd row   male        male
4th row   male        male
5th row   male        male

Common Values

ValueCountFrequency (%)
male 285
63.9%
female 161
36.1%
ValueCountFrequency (%)
male 297
66.6%
female 149
33.4%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common-values plots for Dataset A and Dataset B]
ValueCountFrequency (%)
male 285
63.9%
female 161
36.1%
ValueCountFrequency (%)
male 297
66.6%
female 149
33.4%

Most occurring characters

ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%
ValueCountFrequency (%)
e 595
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 149
 
7.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2106
100.0%
ValueCountFrequency (%)
Lowercase Letter 2082
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%
ValueCountFrequency (%)
e 595
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 149
 
7.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 2106
100.0%
ValueCountFrequency (%)
Latin 2082
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%
ValueCountFrequency (%)
e 595
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 149
 
7.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2106
100.0%
ValueCountFrequency (%)
ASCII 2082
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%
ValueCountFrequency (%)
e 595
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 149
 
7.2%

Age
Real number (ℝ)

               Dataset A   Dataset B
Distinct       75          75
Distinct (%)   21.1%       21.4%
Missing        90          96
Missing (%)    20.2%       21.5%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           30.916433   30.117629
Minimum        0.42        0.75
Maximum        80          71
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB
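
The two percentages for Age use different denominators. The reported figures are consistent with Missing (%) being taken over all 446 rows and Distinct (%) being taken over the non-missing rows only; this is inferred from the numbers in the table rather than from the tool's documentation. A short sketch with a hypothetical helper:

```python
def distinct_and_missing_pct(n_rows: int, n_missing: int, n_distinct: int):
    """Return (distinct %, missing %) under the assumption described above."""
    n_present = n_rows - n_missing
    distinct_pct = 100.0 * n_distinct / n_present  # over non-missing rows
    missing_pct = 100.0 * n_missing / n_rows       # over all rows
    return distinct_pct, missing_pct


# Counts for Age taken from the table above.
distinct_a, missing_a = distinct_and_missing_pct(446, 90, 75)
distinct_b, missing_b = distinct_and_missing_pct(446, 96, 75)

print(f"Dataset A: distinct {distinct_a:.1f}%, missing {missing_a:.1f}%")
print(f"Dataset B: distinct {distinct_b:.1f}%, missing {missing_b:.1f}%")
```

Dividing 75 by all 446 rows would give about 16.8% instead, which does not match either column.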

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0.42        0.75
5-th percentile             4           6
Q1                          22          21
median                      29          28
Q3                          39          39
95-th percentile            58          55.275
Maximum                     80          71
Range                       79.58       70.25
Interquartile range (IQR)   17          18

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                15.135201       14.435373
Coefficient of variation (CV)     0.48955199      0.47929978
Kurtosis                          0.26532849      -0.16521907
Mean                              30.916433       30.117629
Median Absolute Deviation (MAD)   8               9
Skewness                          0.43310607      0.29207307
Sum                               11006.25        10541.17
Variance                          229.07431       208.37999
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
22 18
 
4.0%
30 14
 
3.1%
36 13
 
2.9%
25 13
 
2.9%
28 13
 
2.9%
29 12
 
2.7%
18 12
 
2.7%
26 11
 
2.5%
21 11
 
2.5%
24 10
 
2.2%
Other values (65) 229
51.3%
(Missing) 90
 
20.2%
ValueCountFrequency (%)
25 17
 
3.8%
28 14
 
3.1%
24 13
 
2.9%
27 13
 
2.9%
30 12
 
2.7%
35 10
 
2.2%
18 10
 
2.2%
22 10
 
2.2%
29 10
 
2.2%
34 9
 
2.0%
Other values (65) 232
52.0%
(Missing) 96
21.5%
ValueCountFrequency (%)
0.42 1
 
0.2%
0.83 1
 
0.2%
1 5
1.1%
2 7
1.6%
3 1
 
0.2%
4 4
0.9%
5 1
 
0.2%
6 2
 
0.4%
7 1
 
0.2%
8 2
 
0.4%
ValueCountFrequency (%)
0.75 1
 
0.2%
0.92 1
 
0.2%
1 3
0.7%
2 5
1.1%
3 1
 
0.2%
4 4
0.9%
5 2
 
0.4%
6 2
 
0.4%
7 2
 
0.4%
8 2
 
0.4%
ValueCountFrequency (%)
0.75 1
 
0.2%
0.92 1
 
0.2%
1 3
0.7%
2 5
1.1%
3 1
 
0.2%
4 4
0.9%
5 2
 
0.4%
6 2
 
0.4%
7 2
 
0.4%
8 2
 
0.4%
ValueCountFrequency (%)
0.42 1
 
0.2%
0.83 1
 
0.2%
1 5
1.1%
2 7
1.6%
3 1
 
0.2%
4 4
0.9%
5 1
 
0.2%
6 2
 
0.4%
7 1
 
0.2%
8 2
 
0.4%

SibSp
Real number (ℝ)

               Dataset A    Dataset B
Distinct       7            7
Distinct (%)   1.6%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.49775785   0.49103139
Minimum        0            0
Maximum        8            8
Zeros          308          307
Zeros (%)      69.1%        68.8%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          1           1
95-th percentile            2.75        2
Maximum                     8           8
Range                       8           8
Interquartile range (IQR)   1           1

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                1.0721162       1.0116891
Coefficient of variation (CV)     2.1538911       2.0603349
Kurtosis                          18.478005       16.776802
Mean                              0.49775785      0.49103139
Median Absolute Deviation (MAD)   0               0
Skewness                          3.7645961       3.5037685
Sum                               222             219
Variance                          1.1494332       1.0235149
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%)
0 308
69.1%
1 107
 
24.0%
4 9
 
2.0%
2 8
 
1.8%
3 8
 
1.8%
8 3
 
0.7%
5 3
 
0.7%
ValueCountFrequency (%)
0 307
68.8%
1 105
 
23.5%
2 12
 
2.7%
4 10
 
2.2%
3 8
 
1.8%
8 2
 
0.4%
5 2
 
0.4%
ValueCountFrequency (%)
0 308
69.1%
1 107
 
24.0%
2 8
 
1.8%
3 8
 
1.8%
4 9
 
2.0%
5 3
 
0.7%
8 3
 
0.7%
ValueCountFrequency (%)
0 307
68.8%
1 105
 
23.5%
2 12
 
2.7%
3 8
 
1.8%
4 10
 
2.2%
5 2
 
0.4%
8 2
 
0.4%
ValueCountFrequency (%)
0 307
68.8%
1 105
 
23.5%
2 12
 
2.7%
3 8
 
1.8%
4 10
 
2.2%
5 2
 
0.4%
8 2
 
0.4%
ValueCountFrequency (%)
0 308
69.1%
1 107
 
24.0%
2 8
 
1.8%
3 8
 
1.8%
4 9
 
2.0%
5 3
 
0.7%
8 3
 
0.7%

Parch
Real number (ℝ)

               Dataset A    Dataset B
Distinct       6            7
Distinct (%)   1.3%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.35874439   0.36995516
Minimum        0            0
Maximum        5            6
Zeros          347          342
Zeros (%)      77.8%        76.7%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          0           0
95-th percentile            2           2
Maximum                     5           6
Range                       5           6
Interquartile range (IQR)   0           0

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                0.79124549      0.82646859
Coefficient of variation (CV)     2.2055968       2.2339697
Kurtosis                          9.6071043       12.524856
Mean                              0.35874439      0.36995516
Median Absolute Deviation (MAD)   0               0
Skewness                          2.7958988       3.1311993
Sum                               160             165
Variance                          0.62606943      0.68305034
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=6)
ValueCountFrequency (%)
0 347
77.8%
1 54
 
12.1%
2 37
 
8.3%
5 3
 
0.7%
3 3
 
0.7%
4 2
 
0.4%
ValueCountFrequency (%)
0 342
76.7%
1 65
 
14.6%
2 28
 
6.3%
3 5
 
1.1%
5 3
 
0.7%
4 2
 
0.4%
6 1
 
0.2%
ValueCountFrequency (%)
0 347
77.8%
1 54
 
12.1%
2 37
 
8.3%
3 3
 
0.7%
4 2
 
0.4%
5 3
 
0.7%
ValueCountFrequency (%)
0 342
76.7%
1 65
 
14.6%
2 28
 
6.3%
3 5
 
1.1%
4 2
 
0.4%
5 3
 
0.7%
6 1
 
0.2%
ValueCountFrequency (%)
0 342
76.7%
1 65
 
14.6%
2 28
 
6.3%
3 5
 
1.1%
4 2
 
0.4%
5 3
 
0.7%
6 1
 
0.2%
ValueCountFrequency (%)
0 347
77.8%
1 54
 
12.1%
2 37
 
8.3%
3 3
 
0.7%
4 2
 
0.4%
5 3
 
0.7%

Ticket
Text

               Dataset A   Dataset B
Distinct       386         378
Distinct (%)   86.5%       84.8%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      18          18
Median length   17          17
Mean length     6.6883408   6.793722
Min length      3           3

Characters and Unicode

                      Dataset A   Dataset B
Total characters      2983        3030
Distinct characters   32          32
Distinct categories   5           5
Distinct scripts      2           2
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       333         326
Unique (%)   74.7%       73.1%

Sample

          Dataset A            Dataset B
1st row   350043               2672
2nd row   SOTON/O.Q. 3101307   2678
3rd row   8471                 36864
4th row   C.A. 33111           349251
5th row   248738               345769
ValueCountFrequency (%)
pc 36
 
6.3%
c.a 15
 
2.6%
a/5 8
 
1.4%
ca 7
 
1.2%
2 5
 
0.9%
ston/o 5
 
0.9%
382652 4
 
0.7%
f.c.c 4
 
0.7%
w./c 4
 
0.7%
a/4 3
 
0.5%
Other values (404) 477
84.0%
ValueCountFrequency (%)
pc 32
 
5.7%
a/5 10
 
1.8%
c.a 9
 
1.6%
soton/o.q 6
 
1.1%
ston/o 6
 
1.1%
2 6
 
1.1%
ca 6
 
1.1%
w./c 5
 
0.9%
ston/o2 5
 
0.9%
3101295 4
 
0.7%
Other values (396) 476
84.2%

Most occurring characters

ValueCountFrequency (%)
3 379
12.7%
1 314
10.5%
2 291
9.8%
7 267
9.0%
4 236
 
7.9%
6 205
 
6.9%
5 195
 
6.5%
0 188
 
6.3%
9 165
 
5.5%
8 146
 
4.9%
Other values (22) 597
20.0%
ValueCountFrequency (%)
3 376
12.4%
1 336
11.1%
2 292
9.6%
7 267
8.8%
4 250
8.3%
0 207
 
6.8%
6 189
 
6.2%
9 185
 
6.1%
5 179
 
5.9%
8 142
 
4.7%
Other values (22) 607
20.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2386
80.0%
Uppercase Letter 328
 
11.0%
Other Punctuation 142
 
4.8%
Space Separator 122
 
4.1%
Lowercase Letter 5
 
0.2%
ValueCountFrequency (%)
Decimal Number 2423
80.0%
Uppercase Letter 337
 
11.1%
Other Punctuation 143
 
4.7%
Space Separator 119
 
3.9%
Lowercase Letter 8
 
0.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 379
15.9%
1 314
13.2%
2 291
12.2%
7 267
11.2%
4 236
9.9%
6 205
8.6%
5 195
8.2%
0 188
7.9%
9 165
6.9%
8 146
 
6.1%
ValueCountFrequency (%)
3 376
15.5%
1 336
13.9%
2 292
12.1%
7 267
11.0%
4 250
10.3%
0 207
8.5%
6 189
7.8%
9 185
7.6%
5 179
7.4%
8 142
 
5.9%
Space Separator
ValueCountFrequency (%)
122
100.0%
ValueCountFrequency (%)
119
100.0%
Other Punctuation
ValueCountFrequency (%)
. 98
69.0%
/ 44
31.0%
ValueCountFrequency (%)
. 91
63.6%
/ 52
36.4%
Uppercase Letter
ValueCountFrequency (%)
C 82
25.0%
P 60
18.3%
O 45
13.7%
A 39
11.9%
S 31
 
9.5%
N 19
 
5.8%
T 16
 
4.9%
W 7
 
2.1%
Q 6
 
1.8%
I 6
 
1.8%
Other values (5) 17
 
5.2%
ValueCountFrequency (%)
C 72
21.4%
O 60
17.8%
P 47
13.9%
A 39
11.6%
S 38
11.3%
N 25
 
7.4%
T 23
 
6.8%
Q 10
 
3.0%
W 5
 
1.5%
I 5
 
1.5%
Other values (5) 13
 
3.9%
Lowercase Letter
ValueCountFrequency (%)
a 2
40.0%
r 1
20.0%
i 1
20.0%
s 1
20.0%
ValueCountFrequency (%)
a 2
25.0%
r 2
25.0%
i 2
25.0%
s 2
25.0%

Most occurring scripts

ValueCountFrequency (%)
Common 2650
88.8%
Latin 333
 
11.2%
ValueCountFrequency (%)
Common 2685
88.6%
Latin 345
 
11.4%

Most frequent character per script

Common
ValueCountFrequency (%)
3 379
14.3%
1 314
11.8%
2 291
11.0%
7 267
10.1%
4 236
8.9%
6 205
7.7%
5 195
7.4%
0 188
7.1%
9 165
6.2%
8 146
 
5.5%
Other values (3) 264
10.0%
ValueCountFrequency (%)
3 376
14.0%
1 336
12.5%
2 292
10.9%
7 267
9.9%
4 250
9.3%
0 207
7.7%
6 189
7.0%
9 185
6.9%
5 179
6.7%
8 142
 
5.3%
Other values (3) 262
9.8%
Latin
ValueCountFrequency (%)
C 82
24.6%
P 60
18.0%
O 45
13.5%
A 39
11.7%
S 31
 
9.3%
N 19
 
5.7%
T 16
 
4.8%
W 7
 
2.1%
Q 6
 
1.8%
I 6
 
1.8%
Other values (9) 22
 
6.6%
ValueCountFrequency (%)
C 72
20.9%
O 60
17.4%
P 47
13.6%
A 39
11.3%
S 38
11.0%
N 25
 
7.2%
T 23
 
6.7%
Q 10
 
2.9%
W 5
 
1.4%
I 5
 
1.4%
Other values (9) 21
 
6.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2983
100.0%
ValueCountFrequency (%)
ASCII 3030
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 379
12.7%
1 314
10.5%
2 291
9.8%
7 267
9.0%
4 236
 
7.9%
6 205
 
6.9%
5 195
 
6.5%
0 188
 
6.3%
9 165
 
5.5%
8 146
 
4.9%
Other values (22) 597
20.0%
ValueCountFrequency (%)
3 376
12.4%
1 336
11.1%
2 292
9.6%
7 267
8.8%
4 250
8.3%
0 207
 
6.8%
6 189
 
6.2%
9 185
 
6.1%
5 179
 
5.9%
8 142
 
4.7%
Other values (22) 607
20.0%

Fare
Real number (ℝ)

               Dataset A   Dataset B
Distinct       182         181
Distinct (%)   40.8%       40.6%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           30.337481   31.170748
Minimum        0           0
Maximum        512.3292    512.3292
Zeros          6           10
Zeros (%)      1.3%        2.2%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             7.225       7.05105
Q1                          7.8958      7.8958
median                      13.25       13.5
Q3                          29.55625    30
95-th percentile            90          109.76873
Maximum                     512.3292    512.3292
Range                       512.3292    512.3292
Interquartile range (IQR)   21.66045    22.1042

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                49.259648       51.031205
Coefficient of variation (CV)     1.6237224       1.6371505
Kurtosis                          44.351288       38.185896
Mean                              30.337481       31.170748
Median Absolute Deviation (MAD)   6               6.275
Skewness                          5.6248305       5.2129469
Sum                               13530.516       13902.154
Variance                          2426.5129       2604.1839
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
13 19
 
4.3%
7.8958 19
 
4.3%
8.05 18
 
4.0%
7.75 17
 
3.8%
10.5 13
 
2.9%
26 13
 
2.9%
7.2292 10
 
2.2%
7.8542 9
 
2.0%
8.6625 9
 
2.0%
7.925 8
 
1.8%
Other values (172) 311
69.7%
ValueCountFrequency (%)
7.8958 26
 
5.8%
8.05 23
 
5.2%
13 21
 
4.7%
26 17
 
3.8%
7.75 15
 
3.4%
0 10
 
2.2%
7.225 9
 
2.0%
7.925 9
 
2.0%
10.5 9
 
2.0%
26.55 9
 
2.0%
Other values (171) 298
66.8%
ValueCountFrequency (%)
0 6
1.3%
4.0125 1
 
0.2%
6.4958 1
 
0.2%
6.75 1
 
0.2%
6.975 1
 
0.2%
7.0458 1
 
0.2%
7.05 4
0.9%
7.0542 1
 
0.2%
7.125 3
 
0.7%
7.225 8
1.8%
ValueCountFrequency (%)
0 10
2.2%
4.0125 1
 
0.2%
5 1
 
0.2%
6.4375 1
 
0.2%
6.45 1
 
0.2%
6.4958 1
 
0.2%
6.975 2
 
0.4%
7.0458 1
 
0.2%
7.05 5
1.1%
7.0542 2
 
0.4%
ValueCountFrequency (%)
0 10
2.2%
4.0125 1
 
0.2%
5 1
 
0.2%
6.4375 1
 
0.2%
6.45 1
 
0.2%
6.4958 1
 
0.2%
6.975 2
 
0.4%
7.0458 1
 
0.2%
7.05 5
1.1%
7.0542 2
 
0.4%
ValueCountFrequency (%)
0 6
1.3%
4.0125 1
 
0.2%
6.4958 1
 
0.2%
6.75 1
 
0.2%
6.975 1
 
0.2%
7.0458 1
 
0.2%
7.05 4
0.9%
7.0542 1
 
0.2%
7.125 3
 
0.7%
7.225 8
1.8%

Cabin
Text

               Dataset A   Dataset B
Distinct       83          85
Distinct (%)   79.0%       88.5%
Missing        341         350
Missing (%)    76.5%       78.5%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      15          11
Median length   3           3
Mean length     3.4761905   3.4375
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      365         330
Distinct characters   19          19
Distinct categories   3           3
Distinct scripts      2           2
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       63          74
Unique (%)   60.0%       77.1%

Sample

          Dataset A   Dataset B
1st row   C78         C22 C26
2nd row   D17         B86
3rd row   D33         B5
4th row   C93         E38
5th row   C87         E24
ValueCountFrequency (%)
g6 4
 
3.3%
f 3
 
2.5%
c65 2
 
1.7%
b22 2
 
1.7%
d17 2
 
1.7%
e101 2
 
1.7%
e121 2
 
1.7%
d33 2
 
1.7%
c93 2
 
1.7%
b98 2
 
1.7%
Other values (84) 98
81.0%
ValueCountFrequency (%)
c83 2
 
1.8%
c52 2
 
1.8%
f 2
 
1.8%
f2 2
 
1.8%
e101 2
 
1.8%
b5 2
 
1.8%
e121 2
 
1.8%
g6 2
 
1.8%
d26 2
 
1.8%
c26 2
 
1.8%
Other values (86) 89
81.7%

Most occurring characters

ValueCountFrequency (%)
3 37
 
10.1%
2 36
 
9.9%
B 32
 
8.8%
C 32
 
8.8%
6 26
 
7.1%
1 25
 
6.8%
5 24
 
6.6%
8 21
 
5.8%
E 19
 
5.2%
4 18
 
4.9%
Other values (9) 95
26.0%
ValueCountFrequency (%)
2 36
10.9%
C 35
 
10.6%
1 29
 
8.8%
B 26
 
7.9%
6 23
 
7.0%
5 21
 
6.4%
3 19
 
5.8%
E 18
 
5.5%
8 18
 
5.5%
4 17
 
5.2%
Other values (9) 88
26.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 228
62.5%
Uppercase Letter 121
33.2%
Space Separator 16
 
4.4%
ValueCountFrequency (%)
Decimal Number 208
63.0%
Uppercase Letter 109
33.0%
Space Separator 13
 
3.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 37
16.2%
2 36
15.8%
6 26
11.4%
1 25
11.0%
5 24
10.5%
8 21
9.2%
4 18
7.9%
9 17
7.5%
7 14
 
6.1%
0 10
 
4.4%
ValueCountFrequency (%)
2 36
17.3%
1 29
13.9%
6 23
11.1%
5 21
10.1%
3 19
9.1%
8 18
8.7%
4 17
8.2%
0 17
8.2%
9 16
7.7%
7 12
 
5.8%
Uppercase Letter
ValueCountFrequency (%)
B 32
26.4%
C 32
26.4%
E 19
15.7%
D 17
14.0%
F 7
 
5.8%
A 7
 
5.8%
G 6
 
5.0%
T 1
 
0.8%
ValueCountFrequency (%)
C 35
32.1%
B 26
23.9%
E 18
16.5%
D 16
14.7%
F 5
 
4.6%
A 5
 
4.6%
G 3
 
2.8%
T 1
 
0.9%
Space Separator
ValueCountFrequency (%)
16
100.0%
ValueCountFrequency (%)
13
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 244
66.8%
Latin 121
33.2%
ValueCountFrequency (%)
Common 221
67.0%
Latin 109
33.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 37
15.2%
2 36
14.8%
6 26
10.7%
1 25
10.2%
5 24
9.8%
8 21
8.6%
4 18
7.4%
9 17
7.0%
16
6.6%
7 14
 
5.7%
ValueCountFrequency (%)
2 36
16.3%
1 29
13.1%
6 23
10.4%
5 21
9.5%
3 19
8.6%
8 18
8.1%
4 17
7.7%
0 17
7.7%
9 16
7.2%
13
 
5.9%
Latin
ValueCountFrequency (%)
B 32
26.4%
C 32
26.4%
E 19
15.7%
D 17
14.0%
F 7
 
5.8%
A 7
 
5.8%
G 6
 
5.0%
T 1
 
0.8%
ValueCountFrequency (%)
C 35
32.1%
B 26
23.9%
E 18
16.5%
D 16
14.7%
F 5
 
4.6%
A 5
 
4.6%
G 3
 
2.8%
T 1
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 365
100.0%
ValueCountFrequency (%)
ASCII 330
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 37
 
10.1%
2 36
 
9.9%
B 32
 
8.8%
C 32
 
8.8%
6 26
 
7.1%
1 25
 
6.8%
5 24
 
6.6%
8 21
 
5.8%
E 19
 
5.2%
4 18
 
4.9%
Other values (9) 95
26.0%
ValueCountFrequency (%)
2 36
10.9%
C 35
 
10.6%
1 29
 
8.8%
B 26
 
7.9%
6 23
 
7.0%
5 21
 
6.4%
3 19
 
5.8%
E 18
 
5.5%
8 18
 
5.5%
4 17
 
5.2%
Other values (9) 88
26.7%

Embarked
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        1           0
Missing (%)    0.2%        0.0%
Memory size    7.0 KiB     7.0 KiB

Category counts
Value   Dataset A   Dataset B
S       311         322
C       90          85
Q       44          39

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      445         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   S           C
2nd row   S           C
3rd row   S           Q
4th row   S           S
5th row   S           S

Common Values

ValueCountFrequency (%)
S 311
69.7%
C 90
 
20.2%
Q 44
 
9.9%
(Missing) 1
 
0.2%
ValueCountFrequency (%)
S 322
72.2%
C 85
 
19.1%
Q 39
 
8.7%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common-values plots for Dataset A and Dataset B]
ValueCountFrequency (%)
s 311
69.9%
c 90
 
20.2%
q 44
 
9.9%
ValueCountFrequency (%)
s 322
72.2%
c 85
 
19.1%
q 39
 
8.7%

Most occurring characters

ValueCountFrequency (%)
S 311
69.9%
C 90
 
20.2%
Q 44
 
9.9%
ValueCountFrequency (%)
S 322
72.2%
C 85
 
19.1%
Q 39
 
8.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 445
100.0%
ValueCountFrequency (%)
Uppercase Letter 446
100.0%

Most frequent character per category

Uppercase Letter
Dataset A

Value | Count | Frequency (%)
S | 311 | 69.9%
C | 90 | 20.2%
Q | 44 | 9.9%

Dataset B

Value | Count | Frequency (%)
S | 322 | 72.2%
C | 85 | 19.1%
Q | 39 | 8.7%

Most occurring scripts

Dataset A

Value | Count | Frequency (%)
Latin | 445 | 100.0%

Dataset B

Value | Count | Frequency (%)
Latin | 446 | 100.0%

Most frequent character per script

Latin
Dataset A

Value | Count | Frequency (%)
S | 311 | 69.9%
C | 90 | 20.2%
Q | 44 | 9.9%

Dataset B

Value | Count | Frequency (%)
S | 322 | 72.2%
C | 85 | 19.1%
Q | 39 | 8.7%

Most occurring blocks

Dataset A

Value | Count | Frequency (%)
ASCII | 445 | 100.0%

Dataset B

Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Most frequent character per block

ASCII
Dataset A

Value | Count | Frequency (%)
S | 311 | 69.9%
C | 90 | 20.2%
Q | 44 | 9.9%

Dataset B

Value | Count | Frequency (%)
S | 322 | 72.2%
C | 85 | 19.1%
Q | 39 | 8.7%

Interactions

[Figures: 25 pairs of interaction plots, each pair showing Dataset A followed by Dataset B]

Correlations

Dataset A

[Figure: correlation plot]

Dataset B

[Figure: correlation plot]

Dataset A

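 | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked
PassengerId | 1.000 | 0.097 | -0.043 | 0.025 | 0.028 | 0.105 | 0.000 | 0.072 | 0.000
Age | 0.097 | 1.000 | -0.186 | -0.239 | 0.161 | 0.146 | 0.261 | 0.000 | 0.111
SibSp | -0.043 | -0.186 | 1.000 | 0.455 | 0.459 | 0.198 | 0.133 | 0.223 | 0.096
Parch | 0.025 | -0.239 | 0.455 | 1.000 | 0.385 | 0.099 | 0.000 | 0.253 | 0.041
Fare | 0.028 | 0.161 | 0.459 | 0.385 | 1.000 | 0.342 | 0.495 | 0.227 | 0.172
Survived | 0.105 | 0.146 | 0.198 | 0.099 | 0.342 | 1.000 | 0.371 | 0.542 | 0.109
Pclass | 0.000 | 0.261 | 0.133 | 0.000 | 0.495 | 0.371 | 1.000 | 0.115 | 0.256
Sex | 0.072 | 0.000 | 0.223 | 0.253 | 0.227 | 0.542 | 0.115 | 1.000 | 0.040
Embarked | 0.000 | 0.111 | 0.096 | 0.041 | 0.172 | 0.109 | 0.256 | 0.040 | 1.000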

Dataset B

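 | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked
PassengerId | 1.000 | 0.034 | -0.084 | -0.018 | -0.037 | 0.013 | 0.000 | 0.034 | 0.000
Age | 0.034 | 1.000 | -0.265 | -0.276 | 0.134 | 0.151 | 0.301 | 0.000 | 0.089
SibSp | -0.084 | -0.265 | 1.000 | 0.448 | 0.405 | 0.117 | 0.096 | 0.164 | 0.000
Parch | -0.018 | -0.276 | 0.448 | 1.000 | 0.419 | 0.120 | 0.000 | 0.225 | 0.000
Fare | -0.037 | 0.134 | 0.405 | 0.419 | 1.000 | 0.274 | 0.471 | 0.157 | 0.190
Survived | 0.013 | 0.151 | 0.117 | 0.120 | 0.274 | 1.000 | 0.274 | 0.514 | 0.161
Pclass | 0.000 | 0.301 | 0.096 | 0.000 | 0.471 | 0.274 | 1.000 | 0.000 | 0.244
Sex | 0.034 | 0.000 | 0.164 | 0.225 | 0.157 | 0.514 | 0.000 | 1.000 | 0.104
Embarked | 0.000 | 0.089 | 0.000 | 0.000 | 0.190 | 0.161 | 0.244 | 0.104 | 1.000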
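
The overview flags Survived and Sex as highly correlated in both datasets. The sketch below shows one way such pairs could be picked out of the two matrices, assuming a cut-off of 0.5 on the absolute coefficient. The threshold and the function name `high_correlation_pairs` are assumptions for illustration, not values taken from the report.

```python
LABELS = ["PassengerId", "Age", "SibSp", "Parch", "Fare",
          "Survived", "Pclass", "Sex", "Embarked"]

# Coefficients copied from the Dataset A and Dataset B matrices in the report.
DATASET_A = [
    [1.000, 0.097, -0.043, 0.025, 0.028, 0.105, 0.000, 0.072, 0.000],
    [0.097, 1.000, -0.186, -0.239, 0.161, 0.146, 0.261, 0.000, 0.111],
    [-0.043, -0.186, 1.000, 0.455, 0.459, 0.198, 0.133, 0.223, 0.096],
    [0.025, -0.239, 0.455, 1.000, 0.385, 0.099, 0.000, 0.253, 0.041],
    [0.028, 0.161, 0.459, 0.385, 1.000, 0.342, 0.495, 0.227, 0.172],
    [0.105, 0.146, 0.198, 0.099, 0.342, 1.000, 0.371, 0.542, 0.109],
    [0.000, 0.261, 0.133, 0.000, 0.495, 0.371, 1.000, 0.115, 0.256],
    [0.072, 0.000, 0.223, 0.253, 0.227, 0.542, 0.115, 1.000, 0.040],
    [0.000, 0.111, 0.096, 0.041, 0.172, 0.109, 0.256, 0.040, 1.000],
]
DATASET_B = [
    [1.000, 0.034, -0.084, -0.018, -0.037, 0.013, 0.000, 0.034, 0.000],
    [0.034, 1.000, -0.265, -0.276, 0.134, 0.151, 0.301, 0.000, 0.089],
    [-0.084, -0.265, 1.000, 0.448, 0.405, 0.117, 0.096, 0.164, 0.000],
    [-0.018, -0.276, 0.448, 1.000, 0.419, 0.120, 0.000, 0.225, 0.000],
    [-0.037, 0.134, 0.405, 0.419, 1.000, 0.274, 0.471, 0.157, 0.190],
    [0.013, 0.151, 0.117, 0.120, 0.274, 1.000, 0.274, 0.514, 0.161],
    [0.000, 0.301, 0.096, 0.000, 0.471, 0.274, 1.000, 0.000, 0.244],
    [0.034, 0.000, 0.164, 0.225, 0.157, 0.514, 0.000, 1.000, 0.104],
    [0.000, 0.089, 0.000, 0.000, 0.190, 0.161, 0.244, 0.104, 1.000],
]


def high_correlation_pairs(labels, matrix, threshold=0.5):
    """List (row, column, value) for upper-triangle cells with |value| >= threshold."""
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):  # skip the diagonal and mirrored cells
            if abs(matrix[i][j]) >= threshold:
                pairs.append((labels[i], labels[j], matrix[i][j]))
    return pairs


print(high_correlation_pairs(LABELS, DATASET_A))  # [('Survived', 'Sex', 0.542)]
print(high_correlation_pairs(LABELS, DATASET_B))  # [('Survived', 'Sex', 0.514)]
```

Under this assumed cut-off, Fare and Pclass (0.495 and 0.471) fall just short, which matches the alert list naming only the Survived and Sex pair.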

Missing values

Dataset A

[Figure] A simple visualization of nullity by column.

Dataset B

[Figure] A simple visualization of nullity by column.

Dataset A

[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Dataset B

[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Dataset A

[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Dataset B

[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
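
Nullity correlation can be read as the ordinary Pearson correlation between per-column "is present" indicators. The sketch below implements that reading in plain Python on hypothetical toy columns. The function name and the data are illustrative only, and the plotted heatmap may differ in detail from this calculation.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    None marks a missing value. Returns None when either column is entirely
    present or entirely missing, because the correlation is then undefined.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# Hypothetical toy columns, not rows from the report.
age = [22.0, None, 26.0, None, 35.0, 54.0]
cabin = ["C85", None, None, None, "E46", None]

r = nullity_correlation(age, cabin)
print(round(r, 3))  # 0.5
```

A value near 1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the missingness patterns are unrelated.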

Sample

Dataset A

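 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
146 | 147 | 1 | 3 | Andersson, Mr. August Edvard ("Wennerstrom") | male | 27.00 | 0 | 0 | 350043 | 7.7958 | NaN | S
131 | 132 | 0 | 3 | Coelho, Mr. Domingos Fernandeo | male | 20.00 | 0 | 0 | SOTON/O.Q. 3101307 | 7.0500 | NaN | S
769 | 770 | 0 | 3 | Gronnestad, Mr. Daniel Danielsen | male | 32.00 | 0 | 0 | 8471 | 8.3625 | NaN | S
70 | 71 | 0 | 2 | Jenkin, Mr. Stephen Curnow | male | 32.00 | 0 | 0 | C.A. 33111 | 10.5000 | NaN | S
78 | 79 | 1 | 2 | Caldwell, Master. Alden Gates | male | 0.83 | 0 | 2 | 248738 | 29.0000 | NaN | S
428 | 429 | 0 | 3 | Flynn, Mr. James | male | NaN | 0 | 0 | 364851 | 7.7500 | NaN | Q
322 | 323 | 1 | 2 | Slayter, Miss. Hilda Mary | female | 30.00 | 0 | 0 | 234818 | 12.3500 | NaN | Q
432 | 433 | 1 | 2 | Louch, Mrs. Charles Alexander (Alice Adelaide Slow) | female | 42.00 | 1 | 0 | SC/AH 3085 | 26.0000 | NaN | S
784 | 785 | 0 | 3 | Ali, Mr. William | male | 25.00 | 0 | 0 | SOTON/O.Q. 3101312 | 7.0500 | NaN | S
412 | 413 | 1 | 1 | Minahan, Miss. Daisy E | female | 33.00 | 1 | 0 | 19928 | 90.0000 | C78 | Q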

Dataset B

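 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
693 | 694 | 0 | 3 | Saad, Mr. Khalil | male | 25.0 | 0 | 0 | 2672 | 7.2250 | NaN | C
852 | 853 | 0 | 3 | Boulos, Miss. Nourelain | female | 9.0 | 1 | 1 | 2678 | 15.2458 | NaN | C
703 | 704 | 0 | 3 | Gallagher, Mr. Martin | male | 25.0 | 0 | 0 | 36864 | 7.7417 | NaN | Q
561 | 562 | 0 | 3 | Sivic, Mr. Husein | male | 40.0 | 0 | 0 | 349251 | 7.8958 | NaN | S
441 | 442 | 0 | 3 | Hampe, Mr. Leon | male | 20.0 | 0 | 0 | 345769 | 9.5000 | NaN | S
818 | 819 | 0 | 3 | Holm, Mr. John Fredrik Alexander | male | 43.0 | 0 | 0 | C 7075 | 6.4500 | NaN | S
797 | 798 | 1 | 3 | Osman, Mrs. Mara | female | 31.0 | 0 | 0 | 349244 | 8.6833 | NaN | S
38 | 39 | 0 | 3 | Vander Planke, Miss. Augusta Maria | female | 18.0 | 2 | 0 | 345764 | 18.0000 | NaN | S
357 | 358 | 0 | 2 | Funk, Miss. Annie Clemmer | female | 38.0 | 0 | 0 | 237671 | 13.0000 | NaN | S
595 | 596 | 0 | 3 | Van Impe, Mr. Jean Baptiste | male | 36.0 | 1 | 1 | 345773 | 24.1500 | NaN | S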

Dataset A

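 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
188 | 189 | 0 | 3 | Bourke, Mr. John | male | 40.0 | 1 | 1 | 364849 | 15.5000 | NaN | Q
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S
222 | 223 | 0 | 3 | Green, Mr. George Henry | male | 51.0 | 0 | 0 | 21440 | 8.0500 | NaN | S
254 | 255 | 0 | 3 | Rosblom, Mrs. Viktor (Helena Wilhelmina) | female | 41.0 | 0 | 2 | 370129 | 20.2125 | NaN | S
317 | 318 | 0 | 2 | Moraweck, Dr. Ernest | male | 54.0 | 0 | 0 | 29011 | 14.0000 | NaN | S
112 | 113 | 0 | 3 | Barton, Mr. David John | male | 22.0 | 0 | 0 | 324669 | 8.0500 | NaN | S
347 | 348 | 1 | 3 | Davison, Mrs. Thomas Henry (Mary E Finck) | female | NaN | 1 | 0 | 386525 | 16.1000 | NaN | S
711 | 712 | 0 | 1 | Klaber, Mr. Herman | male | NaN | 0 | 0 | 113028 | 26.5500 | C124 | S
886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S
58 | 59 | 1 | 2 | West, Miss. Constance Mirium | female | 5.0 | 1 | 2 | C.A. 34651 | 27.7500 | NaN | S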

Dataset B

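 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
465 | 466 | 0 | 3 | Goncalves, Mr. Manuel Estanslas | male | 38.0 | 0 | 0 | SOTON/O.Q. 3101306 | 7.0500 | NaN | S
251 | 252 | 0 | 3 | Strom, Mrs. Wilhelm (Elna Matilda Persson) | female | 29.0 | 1 | 1 | 347054 | 10.4625 | G6 | S
563 | 564 | 0 | 3 | Simmons, Mr. John | male | NaN | 0 | 0 | SOTON/OQ 392082 | 8.0500 | NaN | S
476 | 477 | 0 | 2 | Renouf, Mr. Peter Henry | male | 34.0 | 1 | 0 | 31027 | 21.0000 | NaN | S
550 | 551 | 1 | 1 | Thayer, Mr. John Borland Jr | male | 17.0 | 0 | 2 | 17421 | 110.8833 | C70 | C
537 | 538 | 1 | 1 | LeRoy, Miss. Bertha | female | 30.0 | 0 | 0 | PC 17761 | 106.4250 | NaN | C
557 | 558 | 0 | 1 | Robbins, Mr. Victor | male | NaN | 0 | 0 | PC 17757 | 227.5250 | NaN | C
492 | 493 | 0 | 1 | Molson, Mr. Harry Markland | male | 55.0 | 0 | 0 | 113787 | 30.5000 | C30 | S
791 | 792 | 0 | 2 | Gaskell, Mr. Alfred | male | 16.0 | 0 | 0 | 239865 | 26.0000 | NaN | S
309 | 310 | 1 | 1 | Francatelli, Miss. Laura Mabel | female | 30.0 | 0 | 0 | PC 17485 | 56.9292 | E36 | C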

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.